This script will serve as a reference for plotting methods (traditional and trellis). As usual, set up the environment...
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
#Set print width
pd.set_option('line_width',140)
#Enable R capability
%load_ext rmagic
#Set working directory (cloned from Github)
workdir='/home/choct155/u46e_linux/choct155/b7be6eab-3c3c-4b68-8148-1321c1e344ae/home/choct155/analysis/asdar/asdar/'
First, we need to pull in our dataset that we will be using for this exercise.
%%R
library(sp)
data(meuse)
#Examin data
print(str(meuse))
print(summary(meuse))
The first plot will be simple points. Note that we must first establish which variables to use as coordinates.
%%R
#Set coordinates
coordinates(meuse)<-c('x','y')
#Plot points
print(plot(meuse))
#Set title
title('points')
Note that by establishing a coords slot for meuse (via the coordinates() call), the input data.frame has automatically been converted to a SpatialPointsDataFrame object. It must be remembered that trying to reassign the same coordinates to meuse after this conversion will throw an exception. If the cell must be re-run, the input data.frame must be read in again.
To build a SpatialLines object, the coordinates must be connected in sequence.
%%R
cc<-coordinates(meuse)
print(cc[1:10,])
Recall the class structure. The lines slot in a SpatialLines object is a list of Lines objects, which are themselves lists of Line objects. We must take each hierarchical level into account in building the SpatialLines object from a coordinate array. Additionally, each Lines object must be given a valid ID (hence the '1').
%%R
#Build SL object
m.sl<-SpatialLines(list(Lines(list(Line(cc)),'1')))
#Plot lines
print(plot(m.sl))
#Set title
title('lines')
Now to the interesting data model for our purposes, polygons. For this we are looking at a different set (meuse.riv) holding information for the plotting of a river. Again we have to navigate nested classes to get to a SpatialPolygons object, and again we must provide the objects that sit in the primary spatial slot with a valid identifier. In this case, meuse.riv contains a single polygon.
%%R
data(meuse.riv)
#Build list that sits in polygons slot of SpatialPolygons
meuse.lst<-list(Polygons(list(Polygon(meuse.riv)),'meuse.riv'))
#Build SpatialPolygons object
meuse.sr<-SpatialPolygons(meuse.lst)
#Plot polygon
print(plot(meuse.sr,col='blue'))
#Set title
title('polygon')
Just for the purposes of a composite image, we will run quickly through the grid data construction...
%%R
data(meuse.grid)
coordinates(meuse.grid)<-c('x','y')
meuse.grid<-as(meuse.grid,'SpatialPixels')
image(meuse.grid,col='grey')
title('grid')
Now to build the composite...
%%R
image(meuse.grid, col='tan')
plot(meuse.sr,col='blue',add=TRUE)
plot(meuse,add=TRUE)
We can create panel layouts and customize axes...
%%R
#Generate layout for side by side plots
layout(matrix(c(1,2),1,2))
#Plot each panel
plot(meuse.sr,axes=TRUE)
plot(meuse.sr,axes=FALSE)
#Set the tick intervals and range of the x and y axes
axis(1,at=c(178000+0:2*2000),cex.axis=.7)
axis(2,at=c(326000+0:3*4000),cex.axis=.7)
If we want to show attributes of spatial features, we must use type, size, or color of symbols, lines or polygons to convey information. We will plot the meuse information yet again, even going so far as to use the pixel grid to plot interpolated zinc concentration as a background image.
%%R -w 600 -h 800
library(spatstat)
#print(str(meuse))
#Set interpolation color range for zinc concentration
grays=gray.colors(4,.55,.95)
#Plot zinc concentration
#zn.idw <- idw(log(zinc) ~ 1, meuse, meuse.grid)
#image(zn.idw, col=grays, breaks=log(c(100,200,400,800,1800)))
#Plot river
plot(meuse.riv,add=TRUE)
#Plot points
plot(meuse,pch=1,cex=sqrt(meuse$zinc)/20,add=TRUE)
#Set legend
legVals<-c(100,200,500,1000,2000)
#Create first legend
legend('left',legend=legVals,pch=1,pt.cex=sqrt(legVals)/20,bty='n',title='measured')
#Create another legend
legend('topleft',legend=c('100-200','200-400','400-800','800-1800'),fill=grays,bty='n',title='interpolated')
Moving in a different direction, we can work with areal data as well. Here is an example in which we can actually identify individual polygons...
%%R
library(maptools)
#Set projection
prj<-CRS("+proj=longlat +datum=NAD27")
#Grab shapefile from maptools
nc_shp<-system.file('shapes/sids.shp',
package='maptools')#[1]
print(nc_shp)
#Read shapefile
nc<-readShapePoly(nc_shp,proj4string=prj)
plot(nc)
#Establish coordinate identifier (identify provides labels instead of coordinates)
pt<-locator(type='p') #Not yet clear on sequence of operations here...
print(pt)
To prepare for the lattice plots we seek here, we will need some front end code not included in the book...
%%R
library(gstat)
data(meuse)
#Set coordinates
coordinates(meuse) <- ~x+y
#Convert to grid object
data(meuse.grid)
coordinates(meuse.grid) <- ~x+y
gridded(meuse.grid) <- T
#Introduce kriging (beyond scope of this script, but it's how the interpolation is done)
zn <- krige(zinc~1,meuse,meuse.grid)
zn$direct <- zn$var1.pred
zn$log <- exp(krige(log(zinc)~1,meuse,meuse.grid)$var1.pred)
Now we can get a sense of what lattice provides us with. Note that z is defined by the gridded meuse data (~x+y) from above. We will plot this data with two methods, levelplot and spplot.
With levelplot, it can be seen that the first argument captures the specific data to be mapped (zx+y|name). The vertical bar indicates the desire to have this data (x+y) faceted by a given factor (name). The second argument captures the containing data.frame (spmap.to.lev(zn[c("direct", "log")])). The data are drawn, in this case, from the kriging operation above performed on the gridded meuse data (which resulted in zn). The result was then captured as 'direct', and then a new instance was created that logged the result (captured as 'log'). zn[c("direct", "log")] subsets zn to these two vectors.
Helper function spmap.to.lev converts zn (a SpatialPixelsDataFrame object) to a data.frame that has the grid values for both 'direct' and 'log' in a single column, accompanied by a column of factor labels ('direct' and 'log'). The rest of the arguments set the auxillary characteristics of the plot.
%%R
library(lattice)
print(levelplot(z~x+y|name, spmap.to.lev(zn[c("direct", "log")]), asp = "iso",
cuts=4, col.regions=grey.colors(5, 0.90, 0.50, 2.2)),
split = c(1,1,1,2), more = TRUE)
print(levelplot)
spplot, by contrast, wraps much of this functionality automatically. All that is required is the original data.frame (zn[c("direct", "log")]), and a few parameters indicating the number and type of colors, etc.
%%R
print(spplot(zn[c("direct", "log")], cuts=4,
col.regions=grey.colors(5, 0.90, 0.50, 2.2)), split = c(1,2,1,2))
print(spplot)
ggmap offers an alternative to sp based plotting that relies on the Grammar of Graphics. The drawback is substantially less flexibility with respect to coordinate reference systems. To borrow Hadley Wickham's own words:
"Since ggplot2 is an implementation of the layered grammar of graphics, every plot made with ggplot2 has each of the above elements. Consequently, ggmap plots also have these elements, but certain elements are fixed to map components : the x aesthetic is fixed to longitude, the y aesthetic is fixed to latitude, and the coordinate system is fixed to the Mercator projection."
The upside to fixing these elements is consistent scaling, which turns out to be a very strong positive when faceting.
There are two major steps involved: >1. Use get_map to download images and format them for plotting; and,
- Use ggmap to actually plot.
As far as getting basic spatial information is concern, geocode accepts not only coordinates, but also strings...
%%R
library(ggmap)
print(geocode('the white house'))
Maps are easily brought in centered on a point location, with zoom sitting in the range [3,20]. qmap serves as a convenience mapping function that wraps the finer resolution control of ggmap.
%%R
au<-'american university'
print(qmap(au,zoom=14))
Google Maps is the default pull, but other sources such as OpenStreetMap are also available.
%%R
print(qmap(au,zoom=14),source='osm')
The appearance of the map, or tile style, is driven by the map source. get_map is actually a wrapper function that covers four different sources: >1. Google Maps
The latter two provide the most versatility in tile options. CloudMade, in particular, provides access to thousands of user-created tiles (although you need an API key). There are some pretty cool tile designs in CloudMade (and there is an editor to create your own).
%%R
print(qmap('the white house',zoom = 14,maptype = 53428,api_key='439cf1240ef4468ebb4277d0d31a6570',source = "cloudmade"))
The get_map function actually returns a raster object (like a SpatialPixel object)...
%%R
co<-get_map(location='colorado')
print(str(co))
Note how the data come in coordinate pairs of character values. Each value is a hexadecimal color address. The object carries to main attribute: >1. class = ['ggmap','raster']
- bb (or Bounding Box) has two coordinate pairs, lat/long for the lower left and upper right corners.
ggmap takes this raster object and plots it using the following basic ggplot call of reasonably familiar structure:
"""
ggplot(aes(x=lon, y=lat), data=fourCorners) +
geom_blank() +
coord_map("mercator") +
annotation_raster(ggmap, xmin, xmax, ymin, ymax)
"""
fourCorners here is the data.frame created via the expand.grid function being called on the single row data.frame of bounding box parameters.
%%R
print('***BOUNDING BOX***')
print(attr(co,'bb'))
print(cat('\n'))
print('***EXPAND.GRID(BOUNDING BOX)***')
print(expand.grid(lon=c(attr(co,'bb')$ll.lon,attr(co,'bb')$ur.lon),lat=c(attr(co,'bb')$ll.lat,attr(co,'bb')$ur.lat)))
Additional arguments include the following: >1. extent - Captures how much of the graphics device is covered by the map. Options are ['normal','device','panel']. 'device' is default and fills the entire graphics device without axes. 'panel' fills the device with axes. 'normal' leaves axis padding.
- base_layer - The default base layer is ggplot(aes(x=lon, y=lat), data=fourCorners). This argument allows one to substitute a new base layer, which is generally required for faceting because facet_wrap and facet_grid work of said layer. (Setting maprange=TRUE ensures that the map itself determines the x and y axis limits.)
- legend - Determines legend location ['left','right','bottom','top','topleft','bottomleft','topright','bottomright','none']
- padding - Determines distance of legend from corner when on top.
- darken - tints the map image on the [0,1] interval.
Note that ggmap returns a ggplot object, which can be used as a base layer. This enable smalle tweaks from additional parameters in subsequent plots.
We will now get into an example provided by Hadley to demonstrate the functionality. It examines pre-geocoded violent crime in my hometown of Houston, TX.
%%R
#Examine crime data (taken from the ggmap library)
str(crime)
We need a reasonable extent, and we can find this via trial and error by zooming. The critical feature, however, is using gglocator to grab the bounding box coordinates of our desired extent. It works much like the locator functions used with sp, and is applicable for any ggplot2 object. The kicker, however, is that it's only effective in interactive sessions because it requires manual intervention. gglocator defaults to one click on the map (like locator), and passing an argument of 2 permits two clicks. One click yields a point, two yields a rectangle which can be used as the bounding box.
%%R
#Hadley already figured out a desirable extent
print(qmap('houston',zoom=13))
#Use gglocator to grab bounding box data
#gglocator(2)
#Look at gglocator function
print(gglocator)
Now to massage the data a bit, we need only violent crimes in the desired area. It will also be useful to provide an ordinal ranking of severity. In order of increasing severity: >1. robbery
- aggravated assault
- rape
- murder
%%R
#Subset to violent crimes
violent_crimes <- subset(crime,
offense!='auto theft' & offense!='theft' & offense!= 'burglary')
#Restrict to only the downtown area (the bounding box would have been determined by gglocator in Rstudio rather than Notebook)
violent_crimes <- subset(violent_crimes,
-95.39681<=lon & lon<=-95.34188 & 29.73631<=lat & lat<=29.78400)
#Provide ordinal ranking for crimes
violent_crimes$offense<-factor(violent_crimes$offense,
levels=c('robbery','aggravated assault','rape','murder'))
Now we can actually map the violent crime data using one of the many themes at ggmap's disposal.
%%R
#Set theme
theme_set(theme_bw(16))
#Generate base map
HoustonMap <- qmap("houston", zoom = 14, color = "bw", legend = "topleft")
#Incorporate data
print(HoustonMap + geom_point(aes(x = lon, y = lat,colour = offense, size = offense),data = violent_crimes))
Not bad looking, but it is point data, so Hadley uses this as an opportunity to deal with overplotting via binning. One simple way of doing it is to ignore frequency and just create a map of sectors demonstrating where crime is happening. This method also obscures an understanding of multiple types of violent crime in a given areas. It does, however, have the virtue of being easier to see.
More importantly, this plot demonstrates that we can reformulate the data to provide a different view within the plotting script. This is the advantage that ggplot brings over matplotlib.
%%R
print(HoustonMap + stat_bin2d(aes(x=lon,y=lat,colour=offense,fill=offense),
size=.5,bins=30,alpha=.5,data=violent_crimes))
As noted before, we ignored frequency in creating the binned image. A good way to focus on frequency would be to aggregate all violent crimes and examine the gradient of incidence.
%%R
#Create a new map of Houston
houston<-get_map('houston',zoom=14)
#Create new base map
HoustonMap<-ggmap(houston, extent='device', legend='topleft')
#Generate density plot
print(HoustonMap + stat_density2d(aes(x=lon,y=lat,fill=..level..,alpha=..level..),
size=2,bins=5',data=violent_crimes,geom='polygon'))
#print(violent_crimes$..level..)
The '..level..' parameter seems to capture the aggregate count. It does this by capturing the height of the density value. If we need to clarify the distinction between the map and the color gradient, an inset can be used.
Finally, we get to faceting. What if we want to see the crime hotspots by day of the week? Below we split the data across days and change the color scheme.
%%R -w 800 -h 1000
#Generate new map of Houston
houston<-get_map(location='houston',zoom=14,color='bw',source='osm')
#Generate new base map
HoustonMap<-ggmap(houston,base_layer=ggplot(aes(x=lon,y=lat),data=violent_crimes))
#Generate plot similar to above + faceting
print(HoustonMap + stat_density2d(aes(x=lon,y=lat,fill=..level..,alpha=..level..),
bins=5,geom='polygon',data=violent_crimes) +
scale_fill_gradient(low='black',high='red') +
facet_wrap(~day))
The contour 'clipping' (particularly visible on Sunday) is a known issue to be fixed in future versions of ggplot2. However, it does not obscure the hotspot identification.
ggmap has a utility function called geocode which seeks to cut out the middle man in translating features and addresses to mappable locations. There are three options ['latlon','more','all'] that provide increasingly detailed information from the JSON tree retrieved from Google's Geocoding API.
%%R
#Simple output
print(geocode('1101 4th St. SW, Washington DC 20024',output='latlon'))
cat('\n')
#More detail
print(geocode('1101 4th St. SW, Washington DC 20024',output='more'))
cat('\n')
#Entire JSON tree (commented out because it's really long)
#print(geocode('1101 4th St. SW, Washington DC 20002',output='all'))
revgeocode goes the other way, converting lon/lat into physical addresses...
%%R
ocfo<-geocode('1101 4th St. Southwest, Washington DC 20024')
ocfo<-as.numeric(ocfo)
print(revgeocode(ocfo))
Google also has a Distance Matrix API that provides the travel distance and time between two locations for driving, bicycling, and walking modes of transportation. Initially, I was concerned that this was not quite as useful as a straight line distance between polygon centroids for my purposes, but I have now considered that economic resources must use some mode of transportation. Perhaps it's straight line distances that are more unrealistic...
%%R
#Enter origins and destinations
from<-c('1101 4th St. Southwest, Washington DC 20024','the white house','22 Bryant St. NE, Washington DC 20002')
to<-c('the white house','22 Bryant St. NE, Washington DC 20002','1101 4th St. Southwest, Washington DC 20024')
#Query distances
print(mapdist(from,to))
As can be seen, the output is a convenient data.frame. The default mode is driving, and as with geocoding, varying levels of detail are available via an output argument.
ggmap has simplified the process of working with shapefiles, and eliminated the need to use a commercial GIS system to do so. The basic method involves first reading in a shapefile as a SpatialPolygonsDataFrame object, and then converting it to a data.frame via a function called fortify. Creating areal maps is as easy as merging data with the 'fortified' data.frame. Additional layers can be added by appending more geom layers.
%%R
library(maptools)
library(gpclib)
library(sp)
#Note that we are reading in this data with sp into a SpatialPolygonsDataFrame
DCshp<-readShapeSpatial('/home/choct155/Google Drive/Dissertation/Data/spatial/DC/tl_2012_11_tract.shp',proj4string=CRS('+proj=longlat +datum=WGS84'))
#print(DCshp)
#Convert to r data.frame
dc<-fortify(DCshp)
print(head(dc))
cat('\n')
#Create base map
dcmap<-qmap('washington dc',zoom=11,maptype='toner',source='stamen')
print(dcmap)
#Combine census tract data with base map
print(dcmap + geom_polygon(aes(x=long,y=lat,group=group),
data=dc,colour='white',fill='red',alpha=.5,size=.2))